Systems and methods for recognizability of objects in a multi-layer display

ABSTRACT

A method, system, and computer-readable media of generating a display on a device, including combining content from a plurality of sources into a display, the content from each of the plurality of sources being presented as a layer of the display, and further, each layer of the display being of substantially the same dimensions, detecting one or more objects in each layer of the generated display, and for one or more of the detected objects determining an object type or classification, determining if the object is overlapping or obscuring an object in a different layer of the generated display, and determining if the object will appear to a viewer as if it will overlap or obscure an object in a different layer of the generated display as a result of the motion, orientation, or gaze of the viewer.

CROSS REFERENCE TO RELATED APPLICATIONS

The present application is a continuation of U.S. Application No. 17/708,656, filed Mar. 30, 2022, which claims priority to U.S. Provisional Application No. 63/248,800, filed Sep. 27, 2021, and U.S. Provisional Application No. 63/222,757, filed Jul. 16, 2021, the entire content of each of which is incorporated by reference herein in its entirety for all purposes. U.S. Application No. 17/675,950, filed Feb. 18, 2022, and U.S. Application No. 17/675,975, filed Feb. 18, 2022, are also incorporated by reference herein in their entirety for all purposes.

BACKGROUND

Enabling a person to effectively understand and interact with displayed content is important in many situations. However, as the types of content and the complexity of information increase, single-layer displays become more cluttered with objects and less effective at communicating information and assisting users to perform tasks.

Embodiments of the disclosure are directed to overcoming this and other disadvantages of previous approaches.

SUMMARY

The terms “invention,” “the invention,” “this invention,” “the present invention,” “the present disclosure,” or “the disclosure” as used herein are intended to refer broadly to all the subject matter described in this document, the drawings or figures, and to the claims. Statements containing these terms should be understood not to limit the subject matter described herein or to limit the meaning or scope of the claims. Embodiments covered by this disclosure are defined by the claims and not by this summary. This summary is a high-level overview of various aspects of the disclosure and introduces some of the concepts that are further described in the Detailed Description section below. This summary is not intended to identify key, essential, or required features of the claimed subject matter, nor is it intended to be used in isolation to determine the scope of the claimed subject matter. The subject matter should be understood by reference to appropriate portions of the entire specification, to any or all figures or drawings, and to each claim.

The present disclosure is directed to a method, system, and computer-readable media of generating a display on a device, including combining content from a plurality of sources into a display, wherein the content from each of the plurality of sources is presented as a layer of the display, and further, wherein each layer of the display is of substantially the same dimensions; detecting one or more objects in each layer of the generated display; and for one or more of the detected objects determining an object type or classification; determining if the object is overlapping or obscuring an object in a different layer of the generated display; determining if the object will appear to a viewer as if it will overlap or obscure an object in a different layer of the generated display as a result of the motion, orientation, or gaze of the viewer; and based on the object’s type or classification, a determination that the object is overlapping or obscuring an object in a different layer of the generated display, or a determination that the object will appear to a viewer as if it will overlap or obscure an object in a different layer, modifying a characteristic of the object based on a rule or trained model.

Other objects and advantages of the systems and methods described will be apparent to one of ordinary skill in the art upon review of the detailed description and the included figures. Throughout the drawings, identical reference characters and descriptions indicate similar, but not necessarily identical, elements. While the exemplary embodiments described herein are susceptible to various modifications and alternative forms, specific embodiments have been shown by way of example in the drawings and will be described in detail herein. However, the exemplary embodiments described herein are not intended to be limited to the forms disclosed. Rather, the present disclosure covers all modifications, equivalents, and alternatives falling within the scope of the appended claims.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the system and methods in accordance with the present disclosure will be described with reference to the drawings, in which:

FIG. 1 is a diagram illustrating a typical display generated by a conventional process for a video conference application;

FIG. 2 is a diagram illustrating certain of the concepts involved in an embodiment of the disclosed system and methods;

FIG. 3 is a diagram illustrating how a pixel in a layer of a multiple layer display may be “defined” by or associated with a three-dimensional coordinate system;

FIG. 4 is a diagram illustrating an example of a display screen (such as the display of a computing device) on which are generated and displayed multiple overlapping layers, in accordance with some embodiments;

FIG. 5 is a diagram illustrating an example of multiple video sources being combined or merged to form a multi-layer display; in this example, Layer 0 is a capture of a computer desktop, Layer -1 is a capture of a webcam video feed, and Layer 1 is a capture of a live video streaming feed;

FIG. 6 is a flow chart or flow diagram illustrating a method, process, operation, or set of functions that may be used in implementing an embodiment of the disclosure;

FIG. 7 is a diagram illustrating elements or components that may be present in a computer device or system configured to implement a method, process, function, or operation in accordance with an embodiment of the system and methods described herein; and

FIG. 8 is an example of Transparent Computing.

Note that the same numbers are used throughout the disclosure and figures to reference like components and features.

DETAILED DESCRIPTION

The subject matter of embodiments of the present disclosure is described herein with specificity to meet statutory requirements, but this description is not intended to limit the scope of the claims. The claimed subject matter may be embodied in other ways, may include different elements or steps, and may be used in conjunction with other existing or later developed technologies. This description should not be interpreted as implying any required order or arrangement among or between various steps or elements except when the order of individual steps or arrangement of elements is explicitly noted as being required.

Embodiments of the disclosure will be described more fully herein with reference to the accompanying drawings, which form a part hereof, and which show, by way of illustration, exemplary embodiments by which the disclosure may be practiced. The disclosure may, however, be embodied in different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy the statutory requirements and convey the scope of the disclosure to those skilled in the art.

One approach that may reduce clutter and more effectively communicate information is to display objects in different layers by constructing a multi-layer display. However, when viewing a display which includes multiple layers with different objects in each layer, an object or text in one layer may be difficult to discern if it is even partially obscured by an object or text in another layer. Further, when there are multiple objects from multiple sources composited and presented together, a human or computer desiring to select one or more such objects may not be able to do so in an effective way.

Thus, in some embodiments, one or more of the disclosed functions and capabilities may be used to enable a form of “touchless computing” wherein a user’s gaze, gestures, movements, position, orientation, or other characteristics observed by a camera are used as the basis for selecting objects and executing processes on a computing device. Further, because the opacity of pixels in different layers may be adjusted to improve viewing and recognizability of objects in one or more of the displayed layers of a multi-layer display, this may also be considered a form of “transparent computing.” This type of computing experience may include the presentation of a plurality of objects or content distributed among multiple layers. Transparent computing may also include the ability of a user to interact with a 3-dimensional environment, for example by introducing an image of an object into a layer and interacting with the object as part of performing a task.

This disclosure is directed to systems, devices, and methods for multi-layer display. In some embodiments, the systems and methods described herein may be used to improve the recognizability of objects in different layers of a multi-layer display, where recognizability as used herein refers to the ability of a user to identify, select, or interact with an object in one layer of a multi-layer display. As will be described, the recognizability of an object can be reduced by a situation in which an object in one layer is obscured by or overlaps (either partially or completely) an object in another layer of the multi-layer display. The recognizability of an object in one layer can also be reduced when it or an object in another layer is re-positioned by a user or application. Embodiments improve the recognizability of objects that may be presented in different layers, thereby enabling a user or a computing device to interact with those objects more effectively, such as by selecting an intended object. In some embodiments, improving the recognizability of an object may involve changes to appearance, resolution, contrast, position, or other aspects of an object relative to those of an object in a different layer of a multi-layer display.

Embodiments are directed to solutions to the described limitations in the presentation and use of multi-layer displays, particularly those formed from composite sources. These solutions include (1) a method to dynamically re-arrange the position(s) of one or more visual objects, (2) a method to dynamically adjust the relative visual attributes of one or more visual objects in a layer or layers of a multi-layer display (including but not limited to brightness, contrast, color, opacity, resolution, etc.), and (3) a method that dynamically adjusts both positions and attributes of objects. As described and referred to herein, a “visual object” is comprised of one or more pixels in a digital graphical format (e.g., digital picture, digital video, or video frame buffer) that represent an element or construct to a human and/or to a machine.

In one embodiment, the disclosure is directed to a method for a multi-layer display that prevents the apparent overlap (and in some cases, the possible overlap arising from user or application actions) between objects in different layers of a display, such as the obscuration (either partially or completely) of an object in one layer by an object in another layer of the display. As will be described, the apparent or possible overlap (sometimes referred to as an occlusion or blockage) of one object by another may be the result of an object’s placement in a layer relative to an object in another layer, an object’s motion, or a viewer’s perspective. In one embodiment, the data processing flow and associated logic implemented as part of the method may comprise the following (a brief illustrative sketch follows the list):

-   Detect objects in one or more layers of a display;
    -   In some embodiments, this may be accomplished by accessing and inspecting a video frame buffer of a source (such as a camera, computing device, etc.) of the objects or content displayed in each layer;
    -   In some embodiments, this may be accomplished by use of image recognition, a trained model, or computer vision techniques to identify and classify an object in a layer of a multi-layer display formed from a composite of multiple sources;
-   Once the objects in each of the layers formed from a composite of sources have been detected, the method may determine the type or category of each object, where the type or category may be one of, but is not limited to, “alphabetic text,” “geometric shape,” “window of an open, running application,” “human hand, face, or body,” etc.;
    -   In one embodiment, this determination may be performed by use of a trained model. For example, a trained machine learning (ML) model may be generated by “showing” it a sufficient number of correctly labeled examples (e.g., multiple variations of images of a human hand, labeled as “human hand”). The trained model is then able to recognize a human hand (or other object it has been trained to classify) in new data that it has not “seen” before;
        -   In one embodiment, such a trained model may be in the form of a convolutional neural network (CNN) that can be used to identify and classify an object as to its type or category;
-   Next, the method may determine how to further process one or more of the detected objects, based on the identified type or classification of the object (such as human hand, head, animal, active application window, etc.);
    -   As an example, in one embodiment, this processing may be determined by evaluating each of a set of rules and implementing the result of the rules that are satisfied;
        -   For example, if a human profile is detected, then the method may extract the human profile from the remainder of a webcam video feed (“background subtraction”) and replace it with a digital version. One reason for a digital replacement may be to re-insert one or more of the digital replacements back into the composite, with adjustments to color, brightness, contrast, size, or resolution to improve the ability of a user to recognize and interact with the object;
            -   This type of processing may be performed to address a “limitation of view” problem so that the human profile is no longer “lost” in the busyness of the composite graphics, but instead is more noticeable in the “scene,” thereby improving communication and understanding of the content;
        -   Similarly, if a visual object is determined to be alphabetic text, then the method may re-introduce one or more digital copies of the text back into the composite graphics, with adjustments to color, brightness, or contrast. This type of processing may be performed so that the presented text is easier to read and comprehend for a viewer;
        -   In some embodiments, the method may not only change the visual attributes of one or more objects (such as color, brightness, resolution, or contrast), but may also (or instead) automatically move the position, size, and/or orientation of one or more of the objects;
            -   As an example, if a presenter’s eye in a video image is occluding one or more objects on a presentation slide, then the method may automatically change the position, size, and/or orientation of one or more objects such that they no longer occlude one another or are occluded by the presenter’s eye, thereby improving the effectiveness and utility of the multi-layer display;
-   As part of the processing of an object based on its type or category (or as an additional processing step), an object may be made selectable or activate-able by a user or machine to enable a user or machine to cause an operation to be performed on an object (such as to move or re-position it) or to execute an operation associated with the object (such as to initiate a payment processing or document retrieval operation);
    -   Examples of such executable operations include but are not limited to:
        -   Initiation of a payment transaction;
        -   Activation of a link, resulting in accessing a data storage location or website;
        -   Retrieval of data or a document from a specified storage location (such as a file, folder, etc.);
        -   Transfer of data or a document to a desired location or user;
        -   Launching of a specific application; or
        -   Generating a message;
    -   Similarly, in some cases, an object may be made not selectable by a user or machine;
-   As part of the processing of an object based on its type or category (or as an additional processing step), an evaluation of whether an object in a first layer is obscured by (or is obscuring) an object in a second layer may be performed, with the result of the evaluation used as an input to the processing of the object based on type or category (or as a separate operation) to reduce or remove the obscuration or overlap;
-   As part of the processing of an object based on its type or category (or as an additional processing step), an evaluation of whether an object in a first layer is likely to visually overlap or occlude an object in a second layer may be performed, with the result of the evaluation used as an input to the processing of the object based on type or category (or as a separate operation) to reduce or remove the likelihood or degree of a possible overlap of one object by another;
    -   In some embodiments, this may include determining the likelihood of obscuration or occlusion based on a user’s gaze, angle of view, etc.;
    -   In some embodiments, this may include determining that a possible overlap or occlusion would be viewed by a user because of the user’s position, orientation, motion, gaze, or other aspect;
-   In some embodiments, the method may dynamically adjust both visual attributes (color, brightness, contrast, etc.) and position, size, or orientation of a detected object.
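
As an illustration of the data processing flow outlined above, the following is a minimal Python sketch. The VisualObject and rule representations, the rule table, and the contrast-boosting action are hypothetical stand-ins introduced for illustration only; in practice the objects would come from a trained CNN detector operating on each layer’s frame buffer, as described above.

    from dataclasses import dataclass
    from typing import Callable, Dict, List, Tuple

    @dataclass
    class VisualObject:
        obj_type: str                     # e.g., "alphabetic text", "human hand"
        bbox: Tuple[int, int, int, int]   # (x, y, width, height) within its layer
        layer: int                        # index of the layer the object was detected in

    def boxes_overlap(a, b):
        """Axis-aligned overlap test between two (x, y, w, h) boxes."""
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

    def overlaps_other_layer(obj, all_objects):
        """Rule condition: does this object overlap an object in another layer?"""
        return any(o.layer != obj.layer and boxes_overlap(obj.bbox, o.bbox)
                   for o in all_objects)

    def boost_contrast(obj):
        """Rule action placeholder: adjust the object's visual attributes."""
        print(f"boosting contrast of {obj.obj_type} in layer {obj.layer}")

    # Hypothetical rule-set keyed by detected object type; each entry is a
    # (condition, action) pair evaluated for every matching object.
    RULES: Dict[str, List[Tuple[Callable, Callable]]] = {
        "alphabetic text": [(overlaps_other_layer, boost_contrast)],
    }

    def process_detected_objects(objects):
        for obj in objects:
            for condition, action in RULES.get(obj.obj_type, []):
                if condition(obj, objects):
                    action(obj)

    process_detected_objects([
        VisualObject("alphabetic text", (10, 10, 100, 20), layer=0),
        VisualObject("human hand", (50, 15, 60, 60), layer=-1),
    ])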

In some embodiments, the described adjustments may be performed one time. In some embodiments, the adjustments may be performed continuously and substantially in real time. In some embodiments, the method’s rules, models, or algorithms may be intended to determine an adjustment approach most likely to improve the recognizability of and interaction with a set of objects, thereby maximizing the effectiveness of communication and understanding of content. In this context, “maximizing communication” refers to adjusting characteristics or positions of objects to overcome or compensate for one or more of “limitations of view” and “limitations of intent” as those concerns are described herein.

In some embodiments, processing techniques may be used that automatically adjust a scene and include tools or user interface elements to allow humans and/or computers to determine when and how to adjust the characteristics and/or position of objects.

In some embodiments, a method may adjust opacity at the granularity of a single pixel, an object, or an entire layer to maximize communication and understanding of content.

In one embodiment, the disclosure is directed to a system for multi-layer display that prevents the overlap or occlusion of an object in one layer of a multi-layer display by an object in a different layer of the multi-layer display. In one embodiment, this may include determining that an object in a first layer may be partially or fully obscured by an object in a different layer because of a viewer’s position, gaze, orientation, motion, or action. The system may include a set of computer-executable instructions and an electronic processor or co-processors. When executed by the processor or co-processors, the instructions cause the processor or co-processors (or a device of which they are part) to perform a set of operations that implement an embodiment of the disclosed method or methods.

In one embodiment, the disclosure is directed to a set of computer-executable instructions, wherein when the set of instructions is executed by an electronic processor or co-processors, the processor or co-processors (or a device of which they are part) perform a set of operations that implement an embodiment of the disclosed method or methods.

Among other things, the present disclosure may be embodied in whole or in part as a system, as one or more methods, or as one or more devices. Embodiments of the disclosure may take the form of a hardware-implemented embodiment, a software-implemented embodiment, or an embodiment combining software and hardware aspects. For example, in some embodiments, one or more of the operations, functions, processes, or methods described herein may be implemented by one or more suitable processing elements (such as a processor, microprocessor, CPU, GPU, TPU, controller, etc.) that is part of a client device, server, network element, remote platform (such as a SaaS platform), an “in the cloud” service, or other form of computing or data processing system, device, or platform.

The processing element or elements may be programmed with a set of executable instructions (e.g., software instructions), where the instructions may be stored on (or in) one or more suitable non-transitory data storage elements. In some embodiments, the set of instructions may be conveyed to a user through a transfer of instructions or an application that executes a set of instructions (such as over a network, e.g., the Internet). In some embodiments, a set of instructions or an application may be utilized by an end-user through access to a SaaS platform or a service provided through such a platform.

In some embodiments, one or more of the operations, functions, processes, or methods described herein may be implemented by a specialized form of hardware, such as a programmable gate array, application specific integrated circuit (ASIC), or the like. Note that an embodiment of the inventive methods may be implemented in the form of an application, a sub-routine that is part of a larger application, a “plug-in”, an extension to the functionality of a data processing system or platform, or other suitable form. The following detailed description is, therefore, not to be taken in a limiting sense.

In some displays to which the disclosed approach may be applied, objects and text may be displayed to a user in different layers of an overall display. In such a multi-layer display, a first layer may display one set of objects or text from a first source and a second layer at least partially visible through the first layer may display a second set of objects or text from a second source. Such a multi-layered display may enable a user or users to visually experience a sense of depth between objects and/or text, or to better interpret a set of objects as being part of a group. The objects or text presented in a layer may initially be determined and positioned independently of those in another layer, such as by a different application or source. The techniques and methods disclosed herein may be used to improve the recognizability of objects in the different layers and assist a viewer to interact with the objects more effectively, thereby improving the understanding of the presented content.

The system and methods described in one or more U.S. Patent Applications assigned to the assignee of the present application introduce a mode of combining visual content wherein multiple layers of objects may overlap. This capability may create additional issues that require addressing to assist users. For example, an object (or text) in one layer of a composite display of layers based on different sources may partially or completely obscure an object in another layer, either as originally positioned or after selection and movement by a user. It is also possible that an object (or text) in one layer may be caused to move by a user or an application and, in doing so, appear to obscure or occlude an object or text in another layer. In either situation, a user or computing device may become confused and unable to accomplish a task or perform a function they were planning to perform.

It is noted that this “problem” exists because of the underlying systems and methods used to generate a multi-layer display. Conventional displays and display methods do not layer content from video frame buffers on top of one another, and instead place them either side-by-side, picture-in-picture, or present them one-at-a-time. However, the systems and methods used in implementing embodiments of the disclosure make it possible to display multiple content sources simultaneously on the same full screen. In contrast, conventional approaches require a user to choose to view someone’s webcam feed or their presentation slide during a video-conferencing meeting. The ability to view multiple sources of video content at the same time and in a full screen mode may cause the new and previously not encountered issues or concerns that are addressed by the present disclosure.

When viewing digital content on a display which is a composite of one or more digital graphics sources (e.g., one or more “video feeds,” and in some situations further combined with various windows of a running desktop or mobile-device application), the visual complexity or busyness of the composite may make it difficult for a human or machine to clearly recognize, interpret, and interact with displayed objects or content. As an example, in a composite video of a person’s webcam combined/overlaid with a slide presentation, bold shapes on a slide may visually “clash” with the person’s face from the webcam feed, making it difficult for another human or computer to understand the intended meaning of the composite communication.

As another example, consider a math teacher presenting a virtual lecture, with her face from a webcam feed overlaid with a virtual “blackboard” where she is writing, solving, and explaining equations. In this situation, visual distractions (such as glare from a background lamp) in the webcam feed may make it difficult for students to clearly see the equations. These and similar scenarios are examples of where composite video graphics may interact and/or obscure each other in unintended ways, thereby limiting effective communication and understanding of content.

In addition to the previous descriptions of problems that may arise when using multi-layer displays (which focused on what may be termed “limitations of view”), there may also be problems or difficulties arising from “limitations of intent.” When there are multiple objects from multiple sources combined and presented together in an overall display, a human or computer may not be able to effectively select one or more such objects. As an example, if in a display of a presentation slide combined with a presenter’s webcam feed, the presenter intends to select an object on the slide, the computer system may incorrectly interpret the action as selecting the presenter’s eye, which may be fully or partially occluding the intended object on the slide.

Another problem in effectively interacting with a multi-layer display arises where the “viewer” is a computer (or process, or other form of device) and the computer is attempting to detect and interpret a visual object. In most of the applications of “computer vision,” the techniques are applied to single (i.e., non-composite) video feeds, such as a camera feed from a turnpike station where a computer vision algorithm is attempting to find and interpret a vehicle license plate. In the more complex scenario of a display formed from a composite of sources, it is often difficult (if not unrealistic or infeasible) for a computer algorithm to correctly detect and interpret visual objects, particularly when visual objects from one or more of the “layers” partially or completely occlude one another.

FIG. 1 is a diagram illustrating a typical display generated by a conventional process for a video conference or similar application. As shown in the figure, an overall display 100 may comprise a single source of content (such as a live webcam video stream of a presenter or a screen shared by the presenter) presented as the primary display 102, with other sources presented as thumbnails 104 (such as other participants in a video conference) to the side of the primary display area.

As suggested by FIG. 1, conventional approaches divide the viewing area 100 into discrete sections and place different content into different sections. For example, on a video conference call the presenter’s desktop/presentation may appear as the central “main window” 102 and their webcam feed (or that of other participants) may be presented as a smaller window 104 on another part of the screen. Similarly, some conventional approaches use a “picture-in-picture” approach whereby different sections of the screen are used to show different content.

In contrast, this disclosure is directed to an approach where multiple content sources are shown simultaneously on the same parts of the screen, using transparency/opacity adjustments and blending techniques to digitally merge content and enable users to distinguish objects. This has a benefit in that presenters and participants no longer need to choose which screen area to focus on, but instead can watch the whole screen and see all the content at once. Further, the content elements may be dynamically adjusted for transparency, opacity, size, position, color, contrast, resolution, and other properties to improve recognizability and thereby maximize clarity and understanding of content.

FIG. 2 is a diagram illustrating certain of the concepts involved in an embodiment of the disclosed system and methods. As shown in the figure, content provided by a plurality of sources 202 may be combined or merged and displayed on a screen 204. Sources may include but are not limited to a server (local or remote), a video camera 205 connected to a user’s desktop computer or mobile device and used to generate a video of the user 210, or a camera (C) connected to another device. The generated or merged display 204 may be presented on the user’s device and/or a device of other users. As suggested by the figure, display 204 may be comprised of a plurality of layers 206, with each layer comprising a plurality of pixels 208, typically arranged in rows and columns. As described, in embodiments of the multi-layer display disclosed herein, each layer may represent content or objects from a different source and the layers are of substantially the same dimensions, for example a full screen.

A set of elements, components, and processes of the type shown in FIG. 2 may be used to enable users to view and interact with displayed objects and information. The interactions can be as part of a communication experience, a presentation, a gaming experience, an instructional or educational experience, the monitoring of an event or location, a tour of a venue, the delivery of a service, or other experience in which a person or machine views or interacts with a display and with objects in the display. The displayed information can be an image, text, video, a link to content, an object, a selectable user interface element, or other form of information or content. As suggested by the figure, in one example, the displayed information may be obtained from multiple sources 202, where the sources may include an end user’s device, a remote server storing content, or a camera that is part of a device. The camera may be part of the end user’s device or of another device.

In some embodiments, an object may be identified or classified in a layer of a display or a frame of a video buffer using a computer vision (CV) technique. Computer vision techniques typically rely on image processing algorithms that may first reduce the “color depth” of images and video streams, without a loss of important content or aspects of an item of content. For example, an object that is an image of a flower will still be recognizable as a flower even if its color depth (palette) is reduced from a set of 16 million colors to 256 grayscale colors. This form of processing allows the application of a trained model, such as a convolutional neural network (CNN) or other form of classifier, to detect and classify objects in an image. The reduced color palette produces a significant improvement in the performance of a CNN, so that when processing real-time video, the frames-per-second (FPS) rate can be kept sufficiently high, making the approach feasible in a greater number of situations.
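
As a brief illustration of the color-depth reduction step, the sketch below uses OpenCV to convert a full-color frame to 256-level grayscale before classification. The file name and the commented model.predict() call are assumptions for illustration; the disclosure does not prescribe a specific library or model interface.

    import cv2

    # Reduce color depth: a frame drawn from a palette of ~16 million colors
    # becomes a 256-level grayscale frame, preserving recognizable content
    # while greatly reducing the work a CNN must perform per frame.
    frame = cv2.imread("flower.png")                # hypothetical input frame
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # 3 channels -> 1 channel

    # The reduced-palette frame would then be passed to a trained classifier,
    # e.g. (hypothetical): prediction = model.predict(gray[None, :, :, None])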

In some embodiments, a memory may be accessed and examined to identify operations being executed by a computing device. Examples may include finding an operating system (OS) window/application by calling the OS application programming interfaces (APIs) to get a list of all windows, their position, and their size, using the OS APIs to track where a user is moving the mouse, or finding objects in webpages by inspecting the HTML data that defines what a browser renders on a screen.
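
One possible way to perform such window enumeration from Python is sketched below, assuming the pygetwindow library (a thin wrapper over the operating system’s window APIs); the disclosure itself does not name a specific library.

    import pygetwindow as gw

    # List every OS-level window with its position and size, so that each
    # application window can be treated as a detectable object in a layer.
    for win in gw.getAllWindows():
        if win.title:  # skip unnamed or system windows
            print(win.title, (win.left, win.top), (win.width, win.height))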

Once an object comprising a set of pixels is identified, the contents or appearance of individual pixels or a set of pixels may be adjusted or altered with regards to color, shadow, contrast, opacity, size, shape, resolution, or another characteristic. Together, a set of pixels may form a shape, text, object, or other characters. Each source of pixels (e.g., an executed application, a camera, a video frame buffer, a video feed from an external source) provides content that is “merged” into a single video/graphic display, with the result being displayed on a full screen by the operating system. In real-time and dynamically, as the sources are merged, a system of the type suggested by FIG. 2 can adjust the relative transparency, opacity, amount of occlusion, or select-ability of an object, pixel, or set of pixels. This provides the ability to display a set of objects, images, video, etc. as a distinct layer of a multi-layer display, with some objects varying in transparency, opacity, or other characteristic with respect to objects in other layers.

As suggested, the display on a monitor of the end user’s device may be generated in a manner to include multiple distinct layers, where each layer is comprised of a plurality of pixels and each layer represents content obtained from one or more of an application, document, camera, other device, etc. The pixels in each layer may be adjusted with regards to their transparency, opacity, or other characteristic independently of other pixels in that layer or pixels in other layers. This permits elements or components of one layer to be viewed through other, overlying layers.

For example, one set of objects, text, or other elements may be presented in a layer that is visible through an overlying layer that appears to be placed on top of it. This may be accomplished by adjusting the pixels in the upper or overlying layer so that they are at least partially transparent and permit viewing of certain of the pixels in the lower or underlying layer. The ability to adjust the relative transparency (or viewability) of the pixels in one layer relative to another layer permits a user to select and interact with multiple layers (and hence multiple sources) of content.
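
A minimal sketch of this per-pixel transparency adjustment, using NumPy to alpha-composite two same-sized layers; the region kept fully opaque is an arbitrary example standing in for the pixels of a detected object.

    import numpy as np

    def composite(lower, upper, alpha):
        """Blend an overlying layer onto an underlying one.

        lower, upper: HxWx3 uint8 pixel arrays of identical dimensions.
        alpha: HxW float array in [0, 1]; 1.0 means the upper pixel is fully
        opaque, 0.0 means the lower pixel shows through completely.
        """
        a = alpha[..., None]  # broadcast per-pixel opacity across channels
        return (a * upper + (1.0 - a) * lower).astype(np.uint8)

    # Example: make the upper layer half-transparent everywhere except a
    # 100x100 region kept fully opaque (e.g., around a detected object).
    h, w = 720, 1280
    alpha = np.full((h, w), 0.5)
    alpha[100:200, 100:200] = 1.0
    merged = composite(np.zeros((h, w, 3), np.uint8),
                       np.full((h, w, 3), 255, np.uint8), alpha)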

As mentioned, in one embodiment, a layer of the display may be generated by accessing a video frame buffer of the end user’s computing device (such as a tablet, laptop, desktop, or smartphone). The accessed data may include an object or element, where the object or element may provide a mechanism for accessing content to be integrated with a layer or layers of the multi-layer display generated by the end user’s device. The mechanism may be a recognizable shape or form and may include an identifier, code, or metadata that may be used to access information, data, or content. The identifier, code, or metadata may direct the end user’s device to a remote server, a database, or to information stored on the end user’s device. The accessed information, data, or content may include both content and information that determines how to display that content.

In some embodiments, data captured by a camera (such as an image of a user) may be subject to image processing and analysis to recognize and identify objects or gestures, or to detect and evaluate motion (e.g., a user’s gaze or a user’s or object’s position changes, acceleration, orientation, etc.). In response, an application or process may alter what is displayed in one or more layers of the overall multi-layer display viewed by a user or a camera. For example, a camera may capture a user making a gesture, and in response a layer of the display may be altered to show the selection of a user interface element.

In another example, the perspective, position, or orientation of an object or element displayed in a layer may be altered as a user turns their head. This may be based on tracking the position and orientation of the user’s head or eyes and using that to alter the way in which a source’s content is presented. In a related capability, because the characteristics of a pixel (and hence an object) may be varied from its source characteristics before it is presented in a layer of a multi-layer display, the appearance of depth or shadowing may be added or varied. This provides an ability to alter the apparent significance of an object to a user and increase the likelihood it will be noticed or selected by a user or machine-implemented process.
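
One simple way to realize this head-tracking effect is a parallax shift, sketched below: deeper layers are offset opposite to the viewer’s head movement, producing an appearance of depth. The head offset is assumed to come from face tracking on a webcam feed, and the scaling constant is arbitrary; neither is specified by the disclosure.

    def parallax_offset(head_dx, head_dy, layer_z, strength=0.001):
        """Return the (x, y) shift to apply to a layer at depth layer_z.

        head_dx, head_dy: viewer head offset from screen center, in pixels
        (hypothetically estimated by face tracking on the webcam feed).
        Deeper layers (more negative z) shift more, so the layers appear
        to sit at different depths as the viewer moves.
        """
        factor = strength * abs(layer_z)
        return (-head_dx * factor, -head_dy * factor)

    # The layer at z = -567 shifts as the viewer moves, while the top
    # (z = 0) layer stays fixed.
    print(parallax_offset(40, -10, -567))  # approximately (-22.68, 5.67)
    print(parallax_offset(40, -10, 0))     # (-0.0, 0.0)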

As examples, a user may be enabled to interact with objects or user interface elements displayed in one or more layers of a screen display on a monitor by using gestures, their positioning, their orientation, or their motion that is detected and captured by a video camera. The interactions may be used to control a computing device or presentation of an experience (e.g., a game, lecture, etc.). This may be accomplished without the user’s direct contact with the computing device. Further, a user may introduce an object from their environment into a layer of the display (via a camera capturing an image of the object) and then interact with it as part of what is displayed on the screen or monitor. Embodiments may provide these functions and capabilities through real-time tracking and recognition of a user’s hand and fingers, and presentation of that information as a layer of a display. Recognition of when a user’s finger overlays a user interface element in a different layer of a display may be followed by selecting or activating the user interface element.

As disclosed herein, to assist in correctly and unambiguously determining a user’s intent when they select or interact with a displayed object, a system may incorporate logic to identify an existing or potential overlap or occlusion of one object by another. A potential overlap or occlusion may occur when an object in one layer is being moved or may appear to a user to move due to the user’s motion, gaze, orientation, etc. In response, the system may prevent or reduce the actual or potential obscuration of one object by another by automatically varying a position, orientation, size, shape, transparency, or resolution of an object. As part of this processing (or independently of it), embodiments may alter pixel characteristics to enhance the ability of a user or machine to select a user interface element, object, text box, or other feature.

As mentioned, in some embodiments, one or more of the disclosed functions and capabilities may be used to enable a form of “touchless computing” wherein a user’s gaze, gestures, movements, position, orientation, or other characteristics observed by a camera are used as the basis for selecting objects and executing processes on a computing device. Further, because the opacity of pixels in different layers may be adjusted to improve viewing and recognizability of objects in one or more of the displayed layers of a multi-layer display, this may also be considered a form of “transparent computing.” This type of computing experience may include the presentation of a plurality of objects or content distributed among multiple layers. Transparent computing may also include the ability of a user to interact with a 3-dimensional environment, for example by introducing an image of an object into a layer and interacting with the object as part of performing a task.

In some embodiments, the presence or absence of an object, person, or an attribute of a person or location (e.g., wallpaper, a poster, a scene, a well-known structure, etc.) may be determined by image processing or accessing a video buffer, and that information used as part of an authentication, access control, or other security-related function. In a related example, a camera connected to one computing device may detect and/or identify an object displayed on a screen of another computing device, and that detection and/or identification may be used as part of an authentication or access control process.

In some embodiments, image processing techniques may be used to determine the separation or orientation between a person or object and a camera. This separation or orientation may be used as part of a logical process to decide whether to initiate an authentication or other security process. For example, as a person or object nears a display screen, the distance may be determined and compared to a threshold value. The result of the comparison may then be used to initiate a request for authentication.
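
A sketch of this comparison logic follows, using a pinhole-camera approximation to estimate distance from the pixel height of a detected face. The constants (average face height, focal length in pixels, and the threshold) are illustrative assumptions, not values taken from the disclosure.

    KNOWN_FACE_HEIGHT_CM = 22.0   # assumed average real-world face height
    AUTH_DISTANCE_CM = 60.0       # assumed threshold for triggering authentication

    def estimate_distance_cm(face_px_height, focal_px):
        """Pinhole approximation: distance = real_size * focal / pixel_size."""
        return KNOWN_FACE_HEIGHT_CM * focal_px / face_px_height

    def maybe_request_authentication(face_px_height, focal_px=900.0):
        distance = estimate_distance_cm(face_px_height, focal_px)
        if distance < AUTH_DISTANCE_CM:   # the person has neared the screen
            return "request_authentication"
        return "idle"

    print(maybe_request_authentication(400.0))  # close face -> authentication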

Sharing of the same (or substantially the same) screen space by multiple layers of a multi-layer display effectively introduces another dimension into the processing of video data. Conventionally, video processing is based on a two-dimensional array of pixels, expressed in (x, y) coordinates. In contrast, the present disclosure introduces an approach for use with a three-dimensional array of pixels, expressed in (x, y, z) coordinates, where the color, brightness, contrast, transparency, opacity, and/or resolution of each pixel may be individually adjusted in real-time.

FIG. 3 is a diagram illustrating how a pixel in a layer of a multiple layer display may be “defined” by or associated with a three-dimensional coordinate system. As shown in the figure, a pixel in a first layer may have coordinates (77, 256, 0) in a (x, y, z) coordinate system, while a pixel in a second layer may have coordinates (77, 256, -567) in (x, y, z) coordinates. In such an example, the pixel in the first or top layer may obscure the pixel in the lower layer. However, by adjusting the appearance of the two pixels, it is possible to enable a user to view the lower-level pixel through the top layer pixel without removing either pixel from the display.
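
The sketch below illustrates one way pixels that share an (x, y) position but differ in z might be represented and blended front-to-back, so that the lower-layer pixel remains visible through a partially transparent top-layer pixel. The data structure and blending rule are illustrative assumptions.

    from dataclasses import dataclass

    @dataclass
    class LayeredPixel:
        x: int
        y: int
        z: int                 # layer depth; 0 = top layer, negative = deeper
        color: tuple           # (r, g, b)
        opacity: float = 1.0   # 0.0 = fully transparent .. 1.0 = fully opaque

    def resolve(pixels):
        """Blend pixels at the same (x, y), front-to-back by z."""
        r = g = b = 0.0
        remaining = 1.0  # light not yet absorbed by nearer layers
        for p in sorted(pixels, key=lambda p: p.z, reverse=True):  # z = 0 first
            w = remaining * p.opacity
            r += w * p.color[0]; g += w * p.color[1]; b += w * p.color[2]
            remaining *= (1.0 - p.opacity)
        return (int(r), int(g), int(b))

    top = LayeredPixel(77, 256, 0, (255, 0, 0), opacity=0.4)
    bottom = LayeredPixel(77, 256, -567, (0, 0, 255))
    print(resolve([top, bottom]))  # the lower pixel shows through the top one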

FIG. 4 is a diagram illustrating an example of a display screen 402 (such as the display of a computing device) on which are generated and displayed multiple overlapping layers 404 and 406, in accordance with some embodiments. As a non-limiting example, layer 404 may be generated from content acquired from a source, and layer 406 may represent a feed from a video web camera. Each of layers 404 and 406 may contain one or more visual objects 408, where objects 408 may include one or more of document or video thumbnails, a live speaker’s webcam video, recorded images/videos, or web content, as examples.

As mentioned, in some embodiments, a sub-system or process may be used to detect an actual or possible situation in which an object in one layer overlaps or obscures an object in another layer, and, in response, minimize or eliminate the overlap or obscuration. The overlap or obscuration may occur due to an initial arrangement of objects in different layers, and/or may result from a change in position of an object, a change in how a user views a display (e.g., from directly in front or from the side, or with their eyes facing the display or with their head turned), or other factor. In some embodiments, the sub-system functions to enable visual objects to automatically and dynamically be made “aware” of each other to avoid potential overlaps or obscurations. In this regard, embodiments of the disclosure introduce techniques to alter the appearance of objects within and across displayed layers, either automatically or by human control.

In some embodiments, the automatic object appearance adjustments are based on determining the type and context of objects. Here the context includes the objects themselves (i.e., is the object a human hand making a specific gesture) and/or the ambient context, such as time-of-day, location, or changes in environmental conditions within a video feed. Such environmental changes may include a light being turned on so that objects are lighter, or reflections become more prominent. Object changes may also include human-induced appearance adjustments based on one or more of real-time tracking of a human gaze direction (i.e., where in a layer a person is looking), or the human’s position and movement relative to the objects within the layers.

As described, the disclosed approach and techniques may be used to detect and group pixels into “objects” and to detect, measure, and track the movement, direction, orientation, rotation, and velocity of these “objects” in real time. A trained convolutional neural network (CNN) may be used to detect and classify objects within images and/or live video streams. A sub-system or process may be used to detect an actual or possible situation in which an object in one layer overlaps or obscures an object in another layer (or may appear to), and in response minimize or eliminate the actual or potential overlap or obscuration.

In one embodiment, the sub-system may operate to detect objects and assign a geometric boundary around objects in a three-dimensional space, and monitor events to determine when one or more objects have overlapping geometry coordinates. Further, “intelligent” algorithms or decision processes may be used to implement predictive approaches (such as, but not limited to, stochastic, Bayesian, and/or regression techniques) to predict a likelihood of two objects overlapping or appearing to overlap.
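
A minimal sketch of this boundary-monitoring step, using axis-aligned 3-D boundary volumes as a simplification of the spherical or ovaloid boundaries described elsewhere herein; the coordinates are arbitrary example values.

    from dataclasses import dataclass

    @dataclass
    class Box3D:
        """Axis-aligned geometric boundary assigned around a detected object."""
        x: float; y: float; z: float   # minimum corner
        w: float; h: float; d: float   # extents along each axis

    def intersects(a: Box3D, b: Box3D) -> bool:
        """True when two boundary volumes have overlapping coordinates."""
        return (a.x < b.x + b.w and b.x < a.x + a.w and
                a.y < b.y + b.h and b.y < a.y + a.h and
                a.z < b.z + b.d and b.z < a.z + a.d)

    # Monitoring step: flag any pair of objects whose boundaries now overlap.
    ball = Box3D(100, 100, 0, 50, 50, 1)
    text = Box3D(120, 110, -1, 200, 30, 3)   # spans layers z = -1 .. 2
    print(intersects(ball, text))            # True -> trigger an adjustment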

FIG. 5 is a diagram illustrating an example of multiple video sources being combined or merged to form a multi-layer display; in this example, Layer 0 is a capture of a computer desktop, Layer -1 is a capture of a webcam video feed, and Layer 1 is a capture of a live video streaming feed. Each layer is digitally analyzed, frame-by-frame, pixel-by-pixel, in real-time, and an optimized combination of the aggregate pixels is digitally merged into a composite video stream. Note that this is one example of a situation where composite video graphics may interact and/or obscure each other in unintended ways, thereby limiting effective communication and interactions with the displayed elements of the sources.

As mentioned, in some embodiments, a first step in implementing the disclosed object processing is the computer detection of one or more objects or elements within each layer of a multi-layer display (i.e., in one or more of the composite graphics feeds used to produce the multi-layer display). One or more techniques from the field of Computer Vision may be used to detect and identify/classify an object using a convolutional neural network (CNN), a trained machine learning (ML) model, and/or parsing of digital metadata embedded within video feeds. One or more CNNs or models may be trained to detect and identify visual elements such as edges, corners, shapes, numbers, etc. More complex (deeper) models may be trained to detect specific visual elements, such as hands, eyes, birds, etc.

Once the objects (e.g., images or text) in each layer of a multi-layer or composite feed have been detected, an embodiment of the method may determine the object type, category, or classification using a suitable technique. The determined type, category, or classification may be one of, but is not limited to, “alphabetic text,” “geometric shape” (e.g., square, circle, or oval), “window of an open, executing application,” or “human hand, face, or body,” as examples. Determination of the type, category, or classification may be performed by a trained machine learning model, a rule-set, inspection of a log file or operating system status, or other suitable technique that is applicable to the type or category of object.

In some embodiments, the following techniques may be used to detect and/or identify objects, and as part of determining how an object is to be processed:

-   Convolutional Neural Network (CNN) based models acting as classifiers - CNN and similar neural network-based models can be trained to identify types of visual objects, such as shapes (rectangles, circles/ovals, triangles, etc.), or common shape types (e.g., dog, cat, plant);
    -   As examples, in one embodiment, the types of objects that a CNN has been trained to identify/classify include humans (including a full body pose, just a face, just a hand or pair of hands, just eyes, or a combination thereof), text appearing on a layer, and primitive shapes (including understanding the context in which an identified shape is used (e.g., is a rectangle a web advertisement, or a data table in a PowerPoint presentation?));
-   Geometric-based models to determine if an object in one layer is obscuring an object in another layer - for example, a geometric model may place a 2-dimensional or a 3-dimensional boundary area around an object (e.g., a spherical shape around a basketball, an ovaloid shape around a football, a rectangular shape around a web advertisement, etc.) and use those boundaries to determine the possible obscuration of one object by another;
-   Geometric-based models to determine a likelihood of an actual or apparent overlap, occlusion, or obscuration between an object in one layer and an object in another layer - this may occur as a result of the motion of one object relative to another, and/or a change in the apparent relative positions of objects in different layers due to a movement or selection of an object, or the perspective of a viewer. For example, in some embodiments, the defined boundary areas may be monitored to determine a possible intersection or overlap of the boundary area of one object with a boundary area of one or more other objects in the same or different layers;
    -   Models based on the behavior of objects in motion (such as by incorporation of principles/laws of physics) may be applied in some cases;
        -   For example, to cause objects that may overlap to react or “repel” each other in a realistic manner (e.g., objects with greater mass may intersect with greater momentum and energy and result in the objects appearing to be repelled with greater momentum in the resulting directions);
    -   Models may also be developed to avoid overlap by automatically moving objects “around” each other through adjustments in position, orientation, motion, etc.;
    -   As suggested, even if not overlapping when viewed directly, two objects may appear to overlap or have the potential to overlap due to the viewer’s perspective - thus, as a viewer moves or changes their gaze, two objects that were recognizable may become less so (see the projection sketch following this list); and
-   NLP (natural language processing) and NLU (natural language understanding) models may be used to recognize and interpret text;
    -   For example, a combination of optical character recognition (OCR) and NLP may be used to detect what language or words a group of text characters represent, and this may be used with NLU to infer the intended meaning of words and reduce possible ambiguities.
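
The projection sketch referenced in the list above: projecting layered points onto the screen plane from the viewer’s estimated eye position shows how an object in a deep layer can appear to slide toward an object in the top layer as the viewer moves, even though neither object itself moves. The pinhole-style projection and the numeric values are illustrative assumptions.

    def apparent_position(x, y, z, viewer_x, viewer_y, viewer_z=500.0):
        """Project a point in a layer at depth z (z <= 0) onto the screen
        plane (z = 0) along the line of sight from the viewer's position."""
        t = viewer_z / (viewer_z - z)
        return (viewer_x + (x - viewer_x) * t,
                viewer_y + (y - viewer_y) * t)

    # An object at (300, 200) in a deep layer appears to slide toward a
    # fixed object at (320, 200) in the top (z = 0) layer as the viewer
    # moves to the right, creating an apparent overlap.
    for viewer_x in (0.0, 200.0, 400.0):
        print(apparent_position(300, 200, -567, viewer_x, 0.0))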

In some embodiments, the method then determines (e.g., based on a rule-set, formula, or trained machine learning model) how to further process the identified objects, typically based on the object type or category. For example, if a human profile is detected, then the method may extract the human profile from the remainder of a web cam video feed (i.e., background subtraction) and replicate it digitally. A reason for the digital replication might be to re-insert one or more of the replications back into the composite feed, with an adjustment to color, brightness, contrast, size, resolution, etc.

In one embodiment, a set of rules for determining the subsequent processing of an object or text may be based on a list of object types of interest, where the list may be manually curated by a human. Compilation of such a list may be followed by using a machine-learning algorithm to create a trained model to automatically recognize each of the object types of interest in a video frame buffer or image generated from a video stream. Such models include but are not limited to the use of convolutional neural networks as classifiers, for example.

As an example, in one embodiment, an initial list of object types of interest might include humans (e.g., full body, or specific body parts such as eyes, head, nose, etc.), numbers, text, primitive shapes (e.g., squares, circles, etc.), or mobile phones. For each object type or category, a model may be trained to automatically recognize the object type or class under a variety of image or video conditions (e.g., low/high contrast, low/high quality, low/high background lighting, etc.).

For each object that is of a type of interest, one or more rules may be applied based on the type. For example, if the object type is “text,” an embodiment may implement a rule that acts to translate the visual text into a data equivalent of the text, followed by interpreting the text (using a process or processes such as optical character recognition (OCR), natural language processing (NLP), or natural language understanding (NLU)). This processing may be followed by applying a subsequent rule to block or blur the text to protect data privacy, or to automatically translate the text into an alternate language and present that to a viewer.
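
A sketch of such a text rule follows, assuming OpenCV and the pytesseract OCR wrapper (which requires the Tesseract engine to be installed); the region coordinates and file name are illustrative. Translation or NLP/NLU would then operate on the returned string.

    import cv2
    import pytesseract  # OCR wrapper; assumes the Tesseract engine is installed

    def apply_text_rule(frame, region):
        """Hypothetical rule for objects classified as "text": extract a data
        equivalent of the visual text, then blur the region to protect privacy
        (a translation rule could instead operate on the extracted string)."""
        x, y, w, h = region
        text = pytesseract.image_to_string(frame[y:y + h, x:x + w])
        frame[y:y + h, x:x + w] = cv2.GaussianBlur(
            frame[y:y + h, x:x + w], (31, 31), 0)  # block/blur the raster text
        return text

    # Hypothetical usage:
    # frame = cv2.imread("composite_frame.png")
    # words = apply_text_rule(frame, (40, 40, 300, 60))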

As another example, adjustments or additional processing of image or pixel characteristics may be performed to address the previously described “limitation of view.” An example is to modify a human profile so that it is no longer “lost” in the “visual busyness” of a composite graphic but instead is more clearly distinguishable in the “scene.” This is expected to result in improving communication and understanding of the displayed content.

Similarly, the disclosed method may detect that a visual object is alphabetic text and may re-introduce one or more digital copies of the text back into the composite display, with adjustments for color, brightness, contrast, size, or position so that the text is more readily recognized by viewers of the display. This may provide a solution for a scenario where light from a lamp in the background of a web feed makes it difficult for viewers to see content presented on a blackboard or surface.

In some embodiments, the processing disclosed may change or alter the visual attributes of one or more objects (such as color, brightness, contrast, or resolution) but may also automatically move the position, size, and/or orientation of one or more of the objects. As an example, in a situation where a detected object in one layer is fully or partially occluding one or more objects on a presentation slide in another layer, the method may automatically change the position, size, and/or orientation of one or more of the objects so that they no longer occlude one another, thereby improving the communication and effectiveness of the presented content. In some embodiments, the method may simultaneously and dynamically adjust both visual attributes (e.g., color, brightness, contrast, or resolution) and position, size, or orientation of an object or objects in one or more layers.

As described, embodiments are directed to systems, devices, and methods for a multi-layer display that prevents or reduces the apparent or potential overlap or obscuration (either partially or completely) of an object in one layer by an object in another layer of the display. The adjustments may be one-time or continuous and ongoing, and the method’s techniques may determine an adjustment approach that will improve recognizability of the objects, and thereby maximize effective communication and understanding of the content.

Further, in addition to utilizing techniques that automatically adjust a set of objects or other forms of content, embodiments allow humans and/or computing devices to determine when and how to adjust the appearance, position, attributes, orientation, or other characteristics of objects. As examples, the color and contrast of text may be altered dynamically to make it stand out more clearly from the background, or the position of a video-playback element may be moved to prevent it from overlapping or being overlapped by other parts of an aggregate video.

In some embodiments, the disclosed method may adjust the opacity (or relative transparency) of one or more pixels, objects, or regions of each layer (at single-pixel or multiple-pixel granularity) to improve the recognizability of an object and thereby maximize effective communication and understanding of content. In the situation where a detected visual object is alphabetic text, the method’s techniques may apply optical character recognition (OCR) to dynamically translate the “raster” representation of the text into binary encoded representations (e.g., ASCII byte values).
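A minimal sketch of per-region opacity adjustment during compositing follows, assuming NumPy arrays of identical shape for each layer; the region chosen and the alpha values assigned to it are illustrative.

```python
# Sketch of per-region opacity adjustment when compositing two layers.
# Assumes NumPy arrays of identical shape (H, W, 3); the region and the
# alpha values chosen for it are illustrative.
import numpy as np

def composite(front, back, alpha):
    """Blend two layers; `alpha` may be a scalar or a per-pixel (H, W, 1) map."""
    f = front.astype(np.float32)
    b = back.astype(np.float32)
    return (alpha * f + (1.0 - alpha) * b).astype(np.uint8)

h, w = 120, 160
front = np.full((h, w, 3), 255, dtype=np.uint8)  # e.g., a webcam layer
back = np.zeros((h, w, 3), dtype=np.uint8)       # e.g., a slide layer
alpha = np.full((h, w, 1), 0.9)                  # front layer mostly opaque
alpha[40:80, 50:110] = 0.2  # this region becomes mostly transparent so
                            # the object beneath it is recognizable
merged = composite(front, back, alpha)
```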

Further, in the case where objects are OCR-translated alphabetic text, the method may automatically translate the text into one or more different human languages so that each viewer views the text in a language they select. In this example, the OCR-translated alphabetic text may be processed by a language translation library (such as Google’s Translate API or similar) into the language that each viewer has selected in their settings or profile.
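The following sketch illustrates presenting OCR-extracted text to each viewer in the language selected in their profile. The translate() helper is a hypothetical stand-in for a translation-service client (for example, a wrapper around a translation API) and is not a real interface.

```python
# Sketch of presenting OCR-extracted text to each viewer in the language
# selected in their profile. The translate() helper is a hypothetical
# stand-in for a translation-service client; it is not a real API.

def translate(text, target_lang):
    # In a real system this would call a translation service.
    canned = {("Hello", "es"): "Hola", ("Hello", "fr"): "Bonjour"}
    return canned.get((text, target_lang), text)

viewers = [
    {"name": "Ana", "lang": "es"},
    {"name": "Luc", "lang": "fr"},
]

ocr_text = "Hello"
for viewer in viewers:
    localized = translate(ocr_text, viewer["lang"])
    # Each viewer's display replaces the original raster text with the
    # localized string rendered in their selected language.
    print(viewer["name"], "sees:", localized)
```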

A novel aspect of the disclosure is replacing the original text in video images or streams with one or more different languages (specific to each participant) in real-time. Conventional approaches are not believed capable of translating text into multiple languages while simultaneously hiding/obscuring the original text from sight. This capability enables converting existing text in a live video or presentation into multiple languages in real-time so that each participant’s display presents the content in a language of their choosing.

In a situation in which an object is replicated, the disclosed method may dynamically remove the original source object from the composite graphic. As an example, in the case where digital replicas are being dynamically moved (with regard to position, size, and/or orientation), if the original object were to remain in place, then the desired net effect of the movement may not be achieved because viewers would still see the original version of the object.

With respect to providing a solution to the previously described “limitation of intent” problem, the method’s techniques may dynamically control the ability to select an object or user interface element in one or more of the composite layers. As one example of this capability, in a scenario of two composite video feed layers, the method may make one of the layers ignore mouse-clicks for one or more of that layer’s pixels or objects. In the example of a presenter intending to select an object on a slide that happens to be partially occluded by another object in a different layer, the method may make the pixels that comprise one object ignore a mouse click (or other form of selection), allowing the click “action” to flow through that layer to the intended layer and select the desired object.
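For illustration, the following sketch hit-tests a click against layers from front to back and lets the click fall through objects marked not selectable; the layer and object data structures are illustrative assumptions.

```python
# Sketch of layer-aware selection: a click is hit-tested against layers
# from front to back, and objects marked not selectable pass the action
# through to the layer beneath. The data structures are illustrative.

def hit_test(layers, x, y):
    """Return the first selectable object under (x, y), front to back."""
    for layer in layers:  # layers ordered front -> back
        for obj in layer["objects"]:
            x1, y1, x2, y2 = obj["box"]
            if x1 <= x <= x2 and y1 <= y <= y2:
                if obj.get("selectable", True):
                    return obj
                # Not selectable: ignore it and let the click fall through.
    return None

layers = [
    {"objects": [{"id": "webcam-face", "box": (0, 0, 200, 200),
                  "selectable": False}]},            # front layer
    {"objects": [{"id": "slide-button", "box": (50, 50, 150, 100)}]},
]
print(hit_test(layers, 100, 75))  # -> the slide-button object beneath
```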

Similar to the logic used to determine how to process a specific type of object, the logical processing that determines whether to enable the selectability of an object may be implemented in the form of a trained model or rule-based system. In one example, a rule-based system may start with basic use cases, such as whether a detected object is associated with a well-understood (or unambiguous) purpose. An example would be a “play” button or a shopping cart button, in which case the system may implement logic that makes those objects selectable/clickable, regardless of which layer they reside in.

FIG. 6 is a flow chart or flow diagram illustrating a method, process, operation, or set of functions that may be used in implementing an embodiment of the disclosure. In some embodiments, the set of steps or stages illustrated in the figure may be performed by the execution of a set of computer-executable instructions by one or more electronic processors. The electronic processor(s) may be part of a system, device, platform, or server, and each step or stage illustrated may be performed by one or more of the processors.

In some embodiments, a set of trained models or rule-sets is provided to an end-user and may be included as part of an application or operating system function they install on their client device. In one embodiment, the formation of the display comprising the merged sources of content is performed by the application or function, as is the processing of pixels and objects (apart from the training or construction of the models).

As shown in the figure, the method, process, operation, or set of functions 600 may include, but is not limited to or required to include, the following (a condensed code sketch of this flow appears after the list):

-   Receiving a Video Feed From a Plurality of Sources (as suggested by step or stage 602);
    -   As described, the sources may be one or more of a camera feed, computer desktop, document presented in an operating system window, or streaming video, as non-limiting examples;
-   Combine/Merge the Plurality of Feeds into a Composite Feed to Produce a Multi-Layer Display (as suggested by step or stage 604);
    -   A composite feed is formed by combining the pixels from each source of content into a multi-layer display;
        -   In some embodiments, each source may be used to generate one layer of a multi-layer display;
    -   The source content processing operations and logic may be performed to cause pixels from different sources to be merged into a single display in which each of the sources appears as a different layer or region, or in which objects from different sources are combined into a layer;
-   Detect One or More Objects in Each Layer of a Multi-Layer Display (as suggested by step or stage 606);
    -   This operation or function may be performed by a suitable image processing technique, such as those used for computer vision or object recognition (e.g., a convolutional neural network, box or edge detection, etc.);
    -   The computer vision or object recognition technique may be applied to the generated display, to a video frame buffer, or to another aspect of the system;
-   For Each Object Detected, Determine or Identify an Object Type or Classification (as suggested by step or stage 608);
    -   As described, this may be performed by a trained model operating to identify/classify an object, a rule-set that operates to define characteristics of an object, etc.;
-   Determine and Perform Desired Processing of Each Detected Object Based on its Object Type or Classification (as suggested by step or stage 610);
    -   A set of rules or instructions may be accessed that defines how each object type or classification should be treated;
        -   The rules or instructions may be generated by a user, based on a set of guiding principles for improving object recognizability or more effective communication, a fuzzy logic system, a system that learns from a set of images and instructions for how to improve the images, or another suitable technique;
-   Determine if an Object Should be Made Selectable or Not Selectable by a User or Machine and Set Accordingly (as suggested by step or stage 612);
    -   This may be part of the accessed set of rules or instructions and/or be determined based on which objects or user interface elements are visible after performing the processing of each object;
        -   This determination and action may be optional, or delayed and performed at a later time in the processing flow;
-   Determine if an Object in a First Layer Partially or Completely Overlaps, Occludes, or Obscures an Object in a Second Layer - If Yes, Adjust Position and/or Orientation of One of the Two Objects (as suggested by step or stage 614);
    -   This may be part of the accessed set of rules or instructions and/or be determined based on which objects or user interface elements are partially or completely obscured after performing the processing of each object; and
-   Determine if an Object in a First Layer is Likely to Appear to a Viewer to Overlap, Occlude, or Obscure an Object in Another Layer - If Yes, Adjust Position and/or Orientation of One of the Two Objects (as suggested by step or stage 616);
    -   This may be part of the accessed set of rules or instructions and/or be determined based on which objects or user interface elements are likely to appear to overlap after performing the processing of each object;
    -   As described, this apparent overlap, occlusion, or obscuration may result from motion of an object, execution of an operation by a device, or the viewer’s position, orientation, or gaze, as examples;
-   One or more of steps or stages 606 through 616 may be performed continuously on each frame of a video or each set of content being merged into a multi-layer display.
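The following is a condensed, runnable sketch of the flow of steps 602 through 616; every helper function is a stub standing in for the operations described above, not an implementation of the disclosure.

```python
# Condensed, runnable sketch of the FIG. 6 flow (steps 602-616). Every
# helper below is a stub standing in for the operations described above.

def receive_feed(source):                      # step 602
    return {"source": source, "objects": []}

def merge_feeds(layers):                       # step 604
    return {"layers": layers}

def detect_objects(layer):                     # step 606
    return layer["objects"]

def classify(obj):                             # step 608
    return obj.get("type", "unknown")

def apply_type_rules(obj):                     # step 610
    pass

def set_selectability(obj):                    # step 612
    obj["selectable"] = obj.get("type") != "background"

def fix_actual_overlap(obj, display):          # step 614
    pass

def fix_apparent_overlap(obj, display, gaze):  # step 616
    pass

def process_frame(sources, gaze=None):
    display = merge_feeds([receive_feed(s) for s in sources])
    for layer in display["layers"]:
        for obj in detect_objects(layer):      # steps 606-616, per object
            obj["type"] = classify(obj)
            apply_type_rules(obj)
            set_selectability(obj)
            fix_actual_overlap(obj, display)
            fix_apparent_overlap(obj, display, gaze)
    return display

process_frame(["webcam", "slides"])            # run once per video frame
```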

FIG. 7 is a diagram illustrating elements or components that may be present in a computer device, server, or system 700 configured to implement a method, process, function, or operation in accordance with some embodiments. As noted, in some embodiments, the described system and methods may be implemented in the form of an apparatus that includes a processing element and a set of executable instructions. The executable instructions may be part of a software application and arranged into a software architecture.

In general, an embodiment of the invention may be implemented using a set of software instructions that are designed to be executed by a suitably programmed processing element (such as a GPU, TPU, CPU, microprocessor, processor, controller, computing device, etc.). In a complex application or system, such instructions are typically arranged into “modules,” with each such module typically performing a specific task, process, function, or operation. The entire set of modules may be controlled or coordinated in their operation by an operating system (OS) or other form of organizational platform.

The application modules and/or sub-modules may include any suitable computer-executable code or set of instructions (e.g., as would be executed by a suitably programmed processor, microprocessor, or CPU), such as computer-executable code corresponding to a programming language. For example, programming language source code may be compiled into computer-executable code. Alternatively, or in addition, the programming language may be an interpreted programming language such as a scripting language.

As shown in FIG. 7, system 700 may represent a server or other form of computing or data processing device. Modules 702 each contain a set of executable instructions, where when the set of instructions is executed by a suitable electronic processor (such as that indicated in the figure by “Physical Processor(s) 730”), the system (or server or device) 700 operates to perform a specific process, operation, function, or method. Modules 702 may contain one or more sets of instructions for performing a method or function described with reference to the Figures and the descriptions of the functions and operations provided in the specification. These modules may include those illustrated, but may also include a greater or fewer number than those illustrated. Further, the modules and the sets of computer-executable instructions that are contained in the modules may be executed (in whole or in part) by the same processor or by more than a single processor.

Modules 702 are stored in a memory 720, which typically includes an Operating System module 704 that contains instructions used (among other functions) to access and control the execution of the instructions contained in the other modules. The modules 702 in memory 720 are accessed for purposes of transferring data and executing instructions by use of a “bus” or communications line 719, which also serves to permit processor(s) 730 to communicate with the modules for purposes of accessing and executing a set of instructions. Bus or communications line 719 also permits processor(s) 730 to interact with other elements of system 700, such as input or output devices 722, communications elements 724 for exchanging data and information with devices external to system 700, and additional memory devices 726.

Each application module or sub-module may correspond to a specific function, method, process, or operation that is implemented by the module or sub-module. Each module or sub-module may contain a set of computer-executable instructions that, when executed by a programmed processor or processors, cause the processor or processors (or a device or devices in which they are contained) to perform the specific function, method, process, or operation. Such functions, methods, processes, or operations may include those used to implement one or more aspects of the disclosed system and methods, such as for:

-   Receiving a Video Feed From a Plurality of Sources (module 706);
-   Combine/Merge the Plurality of Feeds into a Composite Feed to Produce a Multi-Layer Display (module 708);
-   Detect One or More Objects in Each Layer of a Multi-Layer Display (module 710);
-   For Each Object Detected, Determine an Object Type or Classification (module 712);
-   Determine and Perform Desired Processing of Each Detected Object Based on its Object Type or Classification (module 714);
-   Determine if an Object Should be Made Selectable or Not Selectable by a User or Machine and Set Accordingly (module 716);
-   Determine if an Object in a First Layer Partially or Completely Overlaps, Occludes, or Obscures an Object in a Second Layer - If Yes, Adjust Position and/or Orientation of One of the Two Objects (module 717); and
-   Determine if an Object in a First Layer is Likely to Appear to a Viewer to Overlap, Occlude, or Obscure an Object in Another Layer - If Yes, Adjust Position and/or Orientation of One of the Two Objects (module 718).

As described, one or more of the processing steps or stages may be performed continuously on each frame of a video, each image, or each set of content being merged into a multi-layer display.

As mentioned, each module may contain instructions which, when executed by a programmed processor, cause an apparatus (such as a server or client device) to perform the specific function or functions. The apparatus may be one or both of a client device or a remote server or platform. Therefore, a module may contain instructions that are performed (in whole or in part) by the client device, the server or platform, or both.

As described, embodiments can adjust, modify, or alter both the characteristics of a pixel (e.g., color, brightness, opacity, resolution, or shadowing) and the characteristics of a group of pixels or an object (e.g., position/location, velocity of movement, orientation, or rotation).

There are multiple contexts or use cases in which an embodiment of the disclosure may be used to provide an enhanced and more effective display of objects and user interface elements, improve the recognizability of objects, and thereby improve communication and the understanding of content. As non-limiting examples:

-   Adjustment and placement of objects in a layer or layers to emphasize an object with a border, color, depth, or perspective;
-   Recognition of text in a layer and conversion or translation of that text into a different language, with the converted text presented in the same or a different layer;
-   Recognition of text in a layer and processing of the text using a natural language understanding (NLU) model to determine the meaning of the text, and in response performing or causing to be executed an operation or function, where examples of such operations or functions include but are not limited to:
    -   Initiation of a payment transaction;
    -   Activation of a link, resulting in accessing a data storage location or website;
    -   Retrieval of data or a document from a specified storage location (such as a file, folder, etc.);
    -   Transfer of data or a document to a desired location or user;
    -   Launching of a specific application; or
    -   Generating a message.
-   Recognition of an object in a layer, processing of the object using a model (such as a CNN-based classifier) to determine an operation or function associated with the object, and if desired, executing that operation or function;
    -   For example, a presenter may show visual thumbnails of slides they plan to present and have these “float” in the same view as the video feed from their webcam. The disclosed system may automatically adjust the position of the thumbnails as the presenter’s webcam image moves, ensuring that the presenter is not obscured by one of the thumbnails;
    -   In another example, a teacher may be presenting on a blackboard while simultaneously overlapping the image from a webcam video. An embodiment may continuously monitor the ambient color/brightness of the overlapping video layers and adjust the color of the chalk on the blackboard, making it either lighter or darker, to ensure it can be viewed more clearly against the rest of the content;
-   Additional non-limiting examples of contexts, environments, or use cases that may benefit from the disclosed techniques for generating a multi-layer display may include:
    -   As described, in some embodiments, when generating the composite display of multiple layers, a viewer’s perspective can be considered. For example, when there is a virtual distance between layers and a viewer moves, the objects can be repositioned, and attributes can be adjusted to accommodate the reality that the viewer is looking at the screen “from the side” instead of from the center of the screen;
    -   In some situations, two viewers may be looking at the same object (e.g., a document), and one may want to see the document more, while the other may want to see the other person’s face. For example, in a banking transaction, the customer may want to see the form rather than the banker’s face, but the banker may want to see the customer’s face rather than the form;
    -   The disclosed techniques may be used to generate displays on devices and systems other than conventional computers or mobile phones. For example, a multi-layer display may be generated on a dashboard screen of an automobile, to display operational information about a vehicle (such as speed, engine RPM, or the status of lights) in addition to a feed generated from a camera inside the vehicle or in another location;
    -   In one embodiment, the disclosed techniques may be used to generate a multi-layer display on a transparent section of glass, for example, on the glass window of a traveling train, where at night the font may be lit differently from what is displayed during the day;
    -   In some embodiments, the disclosed techniques could be used on a glass wall of a smart home or office, where the content displayed would be adjusted based on what is happening behind the glass. In this example, someone looking through the glass wall (which has displayed content) would be able to read displayed text, even if someone with a shirt of the same color as the text passes by, because the system can change the text to a contrasting color at that time (a code sketch of this contrast adjustment appears after the list);
    -   In a heads-up display such as that in some vehicles, when it is bright outside and a driver is behind a white vehicle, they may not be able to clearly see the heads-up display because it is bright white. In this situation, the disclosed techniques may be used to change the heads-up display color to a darker color when the driver is behind a white car;
        -   Similarly, when using smart glasses, if the text is black and the wearer looks up at the sky to see the stars at night, they may not be able to see the text; in this situation the disclosed techniques can vary the text shown in one layer so that it is more readily seen by the wearer.
-   Further, as described, in some embodiments one or more signals may be generated that cause a change in the position, movement, or characteristics of a pixel or object, or the appearance of a layer;
    -   These control signals may be generated in response to a detected or identified object or to information gathered from a user’s camera, as examples;
        -   For example, knowing where a user is looking can result in automatically selecting an object the user is focusing their view upon, by including the viewer’s gaze position as part of the system inputs;
        -   Gaze detection may also be used to help calibrate the system; knowing which object a user is looking at and where it exists on one of the layers of a multi-layer display may provide additional information to the rules, models, or algorithms processing object or pixel data;
        -   Gaze detection may also be used to alter lenticular or parallax effects on a displayed object or objects;
            -   For example, if a user moves and/or looks to the left/right, the display may be dynamically adjusted so an object or objects appear to the user as if they are “peering into” a 3-dimensional space;
        -   A user’s position may be used to “bound” the location of the user as one of the inputs to an object overlap detection and reduction process; or
        -   A user’s movement may be used in the same way to assist in detecting and avoiding overlap between displayed objects.
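As a sketch of the contrast adjustment in the glass-wall example above, the following estimates the luminance of the background behind a text region (using a simplified luminance formula, without gamma correction) and picks a contrasting text color; the threshold and color values are illustrative assumptions.

```python
# Sketch of the contrast adjustment in the glass-wall example: estimate
# the luminance behind a text region (simplified formula, no gamma
# correction) and pick a contrasting text color. Values are illustrative.
import numpy as np

def relative_luminance(rgb):
    r, g, b = (c / 255.0 for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def text_color_for(background_region):
    avg = background_region.reshape(-1, 3).mean(axis=0)
    # Light background -> dark text; dark background -> light text.
    return (0, 0, 0) if relative_luminance(avg) > 0.5 else (255, 255, 255)

passerby = np.full((40, 40, 3), 240, dtype=np.uint8)  # a bright shirt
print(text_color_for(passerby))                       # -> (0, 0, 0)
```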

This disclosure includes the following embodiments and clauses:

A method of generating a display on a device, comprising: combining content from a plurality of sources into a display, wherein the content from each of the plurality of sources is presented as a layer of the display, and further, wherein each layer of the display is of substantially the same dimensions; detecting one or more objects in each layer of the generated display; and for one or more of the detected objects, determining an object type or classification; determining if the object is overlapping or obscuring an object in a different layer of the generated display; determining if the object will appear to a viewer as if it will overlap or obscure an object in a different layer of the generated display as a result of motion, orientation, or gaze of the viewer; and based on the object’s type or classification, a determination that the object is overlapping or obscuring an object in a different layer of the generated display, or a determination that the object will appear to a viewer as if it will overlap or obscure an object in a different layer, modifying a characteristic of the object based on a rule or trained model.

In an embodiment, the characteristic of the object is one or more of a shape, a color, a contrast, a transparency, an opacity, a position, a resolution, or an orientation.

In an embodiment, determining an object type or classification further comprises providing an image of the object to a trained model operating to output a classification of the object.

In an embodiment, the classification of the object is one of text, a human, an animal, or a shape of the object.

In an embodiment, the method further comprises determining if an object should be made selectable or not selectable, and in response setting that characteristic accordingly.

In an embodiment, the sources comprise one or more of a video camera, an application executing on a user’s device, or a remote server storing content.

In an embodiment, modifying a characteristic of the object based on a rule or trained model further comprises accessing a rule, a set of rules, or a trained model from a user’s device that determines how to process the object.

In an embodiment, the method further comprises performing the steps of detecting one or more objects, determining an object type or classification, determining that the object is overlapping or obscuring an object in a different layer of the generated display, or determining that the object will appear to a viewer as if it will overlap or obscure an object in a different layer, and modifying a characteristic of the object based on a rule or trained model, continuously as video content or images are received from the sources.

In an embodiment, if an object is determined to be text expressed in a first language, then the method further comprises: translating the text into a second language; removing the text in the first language; and inserting the text in the second language into the generated display.

A system for generating a display on a device, comprising: one or more electronic processors configured to execute a set of computer-executable instructions; and one or more non-transitory electronic data storage media containing the set of computer-executable instructions, wherein when executed, the instructions cause the one or more electronic processors to: combine content from a plurality of sources into a display, wherein the content from each of the plurality of sources is presented as a layer of the display, and further, wherein each layer of the display is of substantially the same dimensions; detect one or more objects in each layer of the generated display; and for one or more of the detected objects, determine an object type or classification; determine if the object is overlapping or obscuring an object in a different layer of the generated display; determine if the object will appear to a viewer as if it will overlap or obscure an object in a different layer of the generated display as a result of motion, orientation, or gaze of the viewer; and based on the object’s type or classification, a determination that the object is overlapping or obscuring an object in a different layer of the generated display, or a determination that the object will appear to a viewer as if it will overlap or obscure an object in a different layer, modify a characteristic of the object based on a rule or trained model.

One or more non-transitory computer-readable media comprising a set of computer-executable instructions that, when executed by one or more programmed electronic processors, cause the one or more programmed electronic processors to: combine content from a plurality of sources into a display, wherein the content from each of the plurality of sources is presented as a layer of the display, and further, wherein each layer of the display is of substantially the same dimensions; detect one or more objects in each layer of the generated display; and for one or more of the detected objects, determine an object type or classification; determine if the object is overlapping or obscuring an object in a different layer of the generated display; determine if the object will appear to a viewer as if it will overlap or obscure an object in a different layer of the generated display as a result of motion, orientation, or gaze of the viewer; and based on the object’s type or classification, a determination that the object is overlapping or obscuring an object in a different layer of the generated display, or a determination that the object will appear to a viewer as if it will overlap or obscure an object in a different layer, modify a characteristic of the object based on a rule or trained model.

In an embodiment, the characteristic of the object is one or more of a shape, a color, a contrast, a transparency, an opacity, a position, a resolution, or an orientation.

In an embodiment, determining an object type or classification further comprises providing an image of the object to a trained model operating to output a classification of the object.

In an embodiment, the classification of the object is one of text, a human, an animal, or a shape of the object.

In an embodiment, the set of computer-executable instructions, when executed by the one or more programmed electronic processors, cause the one or more programmed electronic processors to determine if an object should be made selectable or not selectable, and in response set that characteristic accordingly.

In an embodiment, the sources comprise one or more of a video camera, an application executing on a user’s device, or a remote server storing content.

In an embodiment, modifying a characteristic of the object based on a rule or trained model further comprises accessing a rule, a set of rules, or a trained model from a user’s device that determines how to process the object.

In an embodiment, the set of computer-executable instructions, when executed by the one or more programmed electronic processors, cause the one or more programmed electronic processors to perform the steps of detecting one or more objects, determining an object type or classification, determining that the object is overlapping or obscuring an object in a different layer of the generated display, or determining that the object will appear to a viewer as if it will overlap or obscure an object in a different layer, and modifying a characteristic of the object based on a rule or trained model, continuously as video content or images are received from the sources.

In an embodiment, if an object is determined to be text expressed in a first language, then the set of computer-executable instructions, when executed by the one or more programmed electronic processors, cause the one or more programmed electronic processors to: translate the text into a second language; remove the text in the first language; and insert the text in the second language into the generated display.

It should be understood that the present invention as described above can be implemented in the form of control logic using computer software in a modular or integrated manner. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will know and appreciate other ways and/or methods to implement the present invention using hardware and a combination of hardware and software.

Machine learning (ML) is increasingly being used to enable the analysis of data and assist in making decisions in multiple industries. To benefit from using machine learning, a machine learning algorithm is applied to a set of training data and labels to generate a “model,” which represents what the application of the algorithm has “learned” from the training data. Each element (or instance or example, in the form of one or more parameters, variables, characteristics, or “features”) of the set of training data is associated with a label or annotation that defines how the element should be classified by the trained model. A machine learning model in the form of a neural network is a set of layers of connected neurons that operate to make a decision (such as a classification) regarding a sample of input data. When trained (i.e., when the weights connecting neurons have converged and become stable, or are within an acceptable amount of variation), the model will operate on a new element of input data to generate the correct label or classification as an output.
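For illustration, and assuming the scikit-learn library, the following is a toy sketch of the training process just described: feature vectors paired with labels are used to fit a small neural-network classifier, which then labels a new element. The features, labels, and network size are illustrative.

```python
# Toy sketch of the training process described above, using a small
# scikit-learn classifier; the features and labels are illustrative.
from sklearn.neural_network import MLPClassifier

# Each training element is a feature vector; each label defines how the
# trained model should classify that element.
X = [[0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8]]
y = ["circle", "circle", "square", "square"]

model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
model.fit(X, y)                       # weights converge during training
print(model.predict([[0.95, 0.9]]))   # expected: ['square'] for a new element
```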

In some embodiments, certain of the methods, models, or functions described herein may be embodied in the form of a trained neural network, where the network is implemented by the execution of a set of computer-executable instructions or a representation of a data structure. The instructions may be stored in (or on) a non-transitory computer-readable medium and executed by a programmed processor or processing element. The set of instructions may be conveyed to a user through a transfer of instructions or an application that executes a set of instructions (such as over a network, e.g., the Internet). The set of instructions or an application may be utilized by an end-user through access to a SaaS platform or a service provided through such a platform. A trained neural network, trained machine learning model, or any other form of decision or classification process may be used to implement one or more of the methods, functions, processes, or operations described herein. Note that a neural network or deep learning model may be characterized in the form of a data structure in which are stored data representing a set of layers containing nodes, and connections between nodes in different layers are created (or formed) that operate on an input to provide a decision or value as an output.

In general terms, a neural network may be viewed as a system of interconnected artificial “neurons” or nodes that exchange messages between each other. The connections have numeric weights that are “tuned” during a training process, so that a properly trained network will respond correctly when presented with an image or pattern to recognize (for example). In this characterization, the network consists of multiple layers of feature-detecting “neurons”; each layer has neurons that respond to different combinations of inputs from the previous layers. Training of a network is performed using a “labeled” dataset of inputs in a wide assortment of representative input patterns that are associated with their intended output response. Training uses general-purpose methods to iteratively determine the weights for intermediate and final feature neurons. In terms of a computational model, each neuron calculates the dot product of its inputs and weights, adds the bias, and applies a non-linear trigger or activation function (for example, using a sigmoid response function).
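The neuron computation just described can be written directly; the following worked example, using NumPy, computes the dot product of inputs and weights, adds a bias, and applies a sigmoid activation. The input and weight values are illustrative.

```python
# Worked example of the neuron computation described above: dot product
# of inputs and weights, plus a bias, passed through a sigmoid activation.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs, weights, bias):
    return sigmoid(np.dot(inputs, weights) + bias)

x = np.array([0.5, -1.0, 2.0])   # inputs from the previous layer
w = np.array([0.4, 0.3, -0.2])   # weights tuned during training
print(neuron(x, w, bias=0.1))    # -> a value in (0, 1), here ~0.40
```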

Any of the software components, processes, or functions described in this application may be implemented as software code to be executed by a processor using any suitable computer language, such as Python, Java, JavaScript, C, C++, or Perl, using procedural, functional, object-oriented, or other techniques. The software code may be stored as a series of instructions or commands in (or on) a non-transitory computer-readable medium, such as a random-access memory (RAM), a read-only memory (ROM), a magnetic medium such as a hard drive or a floppy disk, or an optical medium such as a CD-ROM. In this context, a non-transitory computer-readable medium is almost any medium suitable for the storage of data or an instruction set, aside from a transitory waveform. Any such computer-readable medium may reside on or within a single computational apparatus and may be present on or within different computational apparatuses within a system or network.

According to one example implementation, the term processing element or processor, as used herein, may refer to a central processing unit (CPU), or to something conceptualized as a CPU (such as a virtual machine). In this example implementation, the CPU or a device in which the CPU is incorporated may be coupled, connected, and/or in communication with one or more peripheral devices, such as a display. In another example implementation, the processing element or processor may be incorporated into a mobile computing device, such as a smartphone or tablet computer.

The non-transitory computer-readable storage medium referred to herein may include a number of physical drive units, such as a redundant array of independent disks (RAID), a floppy disk drive, a flash memory, a USB flash drive, an external hard disk drive, a thumb drive, pen drive, or key drive, a High-Density Digital Versatile Disc (HD-DVD) optical disc drive, an internal hard disk drive, a Blu-Ray optical disc drive, or a Holographic Digital Data Storage (HDDS) optical disc drive, synchronous dynamic random access memory (SDRAM), or similar devices or other forms of memory based on similar technologies. Such computer-readable storage media allow the processing element or processor to access computer-executable process steps, application programs, and the like stored on removable and non-removable memory media, to off-load data from a device, or to upload data to a device. As mentioned, with regard to the embodiments described herein, a non-transitory computer-readable medium may include almost any structure, technology, or method apart from a transitory waveform or similar medium.

As shown in FIG. 8, in some embodiments, one or more of the disclosed functions and capabilities may be used to enable a volumetric composite of content-activated layers of Transparent Computing, content-agnostic layers of Transparent Computing, and/or camera-captured layers of Transparent Computing placed visibly behind 2-dimensional or 3-dimensional content displayed on screens, placed in front of 2-dimensional or 3-dimensional content displayed on screens, placed inside of 3-dimensional content displayed on screens, and/or placed virtually outside of the display of screens. Users can interact via Touchless Computing with any layer in a volumetric composite of layers of Transparent Computing, wherein a user’s gaze, gestures, movements, position, orientation, or other characteristics observed by a camera are used as the basis for selecting and interacting with objects in any layer in the volumetric composite of layers of Transparent Computing to execute processes on computing devices.

In some embodiments, one or more of the disclosed functions and capabilities may be used to enable users to see a volumetric composite of layers of Transparent Computing from a 360-degree Optical Lenticular Perspective, wherein a user’s gaze, gestures, movements, position, orientation, or other characteristics observed by cameras are a basis to calculate, derive, and/or predict the 360-degree Optical Lenticular Perspective from which users see the volumetric composite of layers of Transparent Computing displayed on screens. Further, users can engage with a 3-dimensional virtual environment displayed on screens consisting of layers of Transparent Computing placed behind the 3-dimensional virtual environment displayed on screens, placed in front of the 3-dimensional virtual environment displayed on screens, and/or placed inside of the 3-dimensional virtual environment displayed on screens, wherein users can select and interact with objects in any layer of Transparent Computing to execute processes on computing devices while looking at the combination of the 3-dimensional virtual environment and the volumetric composite of layers of Transparent Computing from any angle of the 360-degree Optical Lenticular Perspective available to users.

Certain implementations of the disclosed technology are described herein with reference to block diagrams of systems, and/or to flowcharts or flow diagrams of functions, operations, processes, or methods. It will be understood that one or more blocks of the block diagrams, or one or more stages or steps of the flowcharts or flow diagrams, and combinations of blocks in the block diagrams and stages or steps of the flowcharts or flow diagrams, respectively, may be implemented by computer-executable program instructions. Note that in some embodiments, one or more of the blocks, stages, or steps may not necessarily need to be performed in the order presented, or may not necessarily need to be performed at all.

These computer-executable program instructions may be loaded onto a general-purpose computer, a special-purpose computer, a processor, or other programmable data processing apparatus to produce a specific example of a machine, such that the instructions that are executed by the computer, processor, or other programmable data processing apparatus create means for implementing one or more of the functions, operations, processes, or methods described herein. These computer program instructions may also be stored in a computer-readable memory that may direct a computer or other programmable data processing apparatus to function in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement one or more of the functions, operations, processes, or methods described herein.

While certain implementations of the disclosed technology have been described in connection with what is presently considered to be the most practical and various implementations, it is to be understood that the disclosed technology is not to be limited to the disclosed implementations. Instead, the disclosed implementations are intended to cover various modifications and equivalent arrangements included within the scope of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

This written description uses examples to disclose certain implementations of the disclosed technology, and to enable any person skilled in the art to practice certain implementations of the disclosed technology, including making and using any devices or systems and performing any incorporated methods. The patentable scope of certain implementations of the disclosed technology is defined in the claims, and may include other examples that occur to those skilled in the art. Such other examples are intended to be within the scope of the claims if they have structural and/or functional elements that do not differ from the literal language of the claims, or if they include structural and/or functional elements with insubstantial differences from the literal language of the claims.

All references, including publications, patent applications, and patents, cited herein are hereby incorporated by reference to the same extent as if each reference were individually and specifically indicated to be incorporated by reference and/or were set forth in its entirety herein.

The use of the terms “a” and “an” and “the” and similar referents in the specification and in the following claims is to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. The terms “having,” “including,” “containing,” and similar referents in the specification and in the following claims are to be construed as open-ended terms (e.g., meaning “including, but not limited to”) unless otherwise noted. Recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value inclusively falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein. All methods described herein may be performed in any suitable order unless otherwise indicated herein or clearly contradicted by context. The use of any and all examples, or exemplary language (e.g., “such as”), provided herein is intended merely to better illuminate embodiments of the invention and does not pose a limitation on the scope of the invention unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to each embodiment of the present invention.

As used herein (i.e., the claims, figures, and specification), the term “or” is used inclusively to refer to items in the alternative and in combination.

Different arrangements of the components depicted in the drawings or described above, as well as components and steps not shown or described, are possible. Similarly, some features and sub-combinations are useful and may be employed without reference to other features and sub-combinations. Embodiments of the invention have been described for illustrative and not restrictive purposes, and alternative embodiments will become apparent to readers of this patent. Accordingly, the present invention is not limited to the embodiments described above or depicted in the drawings, and various embodiments and modifications may be made without departing from the scope of the claims below.

What is claimed is:
 1. A method, comprising: detecting an object in a layer of a display; determining an object type or classification of the object; determining whether the object is overlapping or obscuring a different object in a different layer of the display; and based on the object type or classification or a determination that the object is overlapping or obscuring a different object in a different layer of the display, modifying a transparency, a shape, a color, a contrast, an opacity, or a resolution of the object based on a rule or trained model.
 2. The method of claim 1, wherein modifying based on a rule or trained model comprises modifying the transparency, the shape, the color, or the opacity.
 3. The method of claim 1, wherein modifying based on a rule or trained model comprises modifying the transparency.
 4. The method of claim 1, wherein the layer and the different layer are of substantially the same dimensions and associated with different sources.
 5. The method of claim 4, wherein the sources comprise one or more of a video camera, an application executing on a user’s device, or a remote server storing content.
 6. The method of claim 1, further comprising determining whether the object should be made selectable or not selectable, and in response setting the transparency, the shape, the color, the contrast, the opacity, or the resolution accordingly.
 7. The method of claim 1, wherein modifying based on a rule or trained model further comprises accessing a rule, a set of rules, or a trained model from a user’s device that determines how to process the object.
 8. The method of claim 1, wherein when the object is determined to be text expressed in a first language, then the method further comprises: translating the text into a second language; removing the text in the first language; and inserting the text in the second language into the display.
 9. A non-transitory computer-readable storage medium including computer executable instructions, wherein the instructions, when executed by a computer, cause the computer to perform a method, the method comprising: detecting an object in a layer of a display; determining an object type or classification of the object; determining whether the object is overlapping or obscuring a different object in a different layer of the display; and based on the object type or classification or a determination that the object is overlapping or obscuring a different object in a different layer of the display, modifying a transparency, a shape, a color, a contrast, an opacity, or a resolution of the object based on a rule or trained model.
 10. The non-transitory computer-readable storage medium of claim 9, wherein modifying based on a rule or trained model comprises modifying the transparency, the shape, the color, or the opacity.
 11. The non-transitory computer-readable storage medium of claim 9, wherein modifying based on a rule or trained model comprises modifying the transparency.
 12. The non-transitory computer-readable storage medium of claim 9, wherein the layer and the different layer are of substantially the same dimensions and associated with different sources.
 13. The non-transitory computer-readable storage medium of claim 12, wherein the sources comprise one or more of a video camera, an application executing on a user’s device, or a remote server storing content.
 14. The non-transitory computer-readable storage medium of claim 9, further comprising determining whether the object should be made selectable or not selectable, and in response setting the transparency, the shape, the color, the contrast, the opacity, or the resolution accordingly.
 15. The non-transitory computer-readable storage medium of claim 9, wherein modifying based on a rule or trained model further comprises accessing a rule, a set of rules, or a trained model from a user’s device that determines how to process the object.
 16. The non-transitory computer-readable storage medium of claim 9, wherein when the object is determined to be text expressed in a first language, then the method further comprises: translating the text into a second language; removing the text in the first language; and inserting the text in the second language into the display.
 17. An apparatus, comprising: processing circuitry configured to: detect an object in a layer of a display, determine an object type or classification of the object, determine whether the object is overlapping or obscuring a different object in a different layer of the display, and based on the object type or classification or a determination that the object is overlapping or obscuring a different object in a different layer of the display, modify a transparency, a shape, a color, a contrast, an opacity, or a resolution of the object based on a rule or trained model.
 18. The apparatus of claim 17, wherein the processing circuitry is configured to modify based on a rule or trained model by modifying the transparency, the shape, the color, or the opacity.
 19. The apparatus of claim 17, wherein the processing circuitry is configured to modify based on a rule or trained model by modifying the transparency.
 20. The apparatus of claim 17, wherein the layer and the different layer are of substantially the same dimensions and associated with different sources. 